Abstract

Random input patterns induce a partition of the coupling space
of feed-forward neural networks into different cells according to the
generated output sequence. For the perceptron this partition forms
a random multifractal for which the spectrum $f(\alpha)$ can be calculated
analytically using the replica trick. Phase transitions in the
multifractal spectrum correspond to the crossover from percolating to
non-percolating cell sizes. Instabilities of negative moments are related to
the VC-dimension.

PACS numbers: 02.50.-r, 64.60.Ak, 87.10.+e

Multifractal concepts were originally introduced in the context of
developed turbulence and chaotic dynamical systems [?] and have since
become a standard tool for analyzing physical systems with a richer
structure than that induced by dilation symmetry alone (for reviews see [?]).
In contrast to simple scale-invariant situations, as provided, e.g., by
systems at second-order phase transitions, which can be classified with the
help of a few critical exponents only, the description of multifractals
requires a full range of scaling exponents specified by a continuous
function $f(\alpha)$. The reason for this multitude of exponents is that
different moments of the underlying probability distribution are dominated
by different fractal subsets of the system.

The simplest examples of multifractal measures are provided by
deterministic recursive constructions such as the two-scale Cantor set [?].
However, many multifractals observed experimentally, e.g. in
diffusion-limited aggregation, are generated by random processes. It is
therefore of general interest to analyze simple models of random systems
that exhibit multifractality [?]. A particularly simple case is provided by
fractals on which a measure of constant density is distributed. The
multifractal properties are then of purely geometrical origin and
characterize the fractal support itself [?].

In the present letter we show that the coupling space of simple
feed-forward neural networks storing random input-output mappings displays
multifractality. The corresponding spectrum $f(\alpha)$ can be determined
explicitly using the replica trick. On the one hand, these systems may hence
serve as examples to test the properties of random multifractals. On the
other hand, the multifractal analysis of the cell structure imposed on the
coupling space by the random input-output mappings refines and extends the
standard statistical mechanics analysis of the storage and generalization
properties [?] of these systems.

We consider a perceptron with $N$ input bits $\xi_i = \pm 1$, $i = 1, \ldots, N$,
and one output $\sigma = \pm 1$. The output is given as the sign of the scalar
product between the input and the coupling vector $\mathbf{J}$ of the
perceptron, i.e.

$$\sigma = \mathrm{sgn}(\mathbf{J} \cdot \boldsymbol{\xi}) = \mathrm{sgn}\Big(\sum_{i=1}^{N} J_i \xi_i\Big) \qquad (1)$$

We will mainly consider the case in which the $\mathbf{J}$-vector is binary,
$J_i = \pm 1$ (Ising perceptron), and defer the discussion of continuous
couplings (spherical perceptron) to the end of this letter.

Given $p = \gamma N$ different input patterns $\boldsymbol{\xi}^\mu$,
$\mu = 1, \ldots, p$, we can associate with every pattern a hyperplane
perpendicular to it that cuts the coupling space of the perceptron into two
halves according to the possible outputs $\sigma^\mu = \pm 1$. If the inputs
are generated independently at random we will hence find a random partition
of the coupling space into at most $2^p$ cells. These cells can be labeled by
their output sequences $\sigma = \{\sigma^\mu\}$, and their size gives the
probability $P(\sigma)$ that, for a given input sequence
$\{\boldsymbol{\xi}^\mu\}$, the outputs $\sigma$ are generated by a randomly
chosen coupling vector $\mathbf{J}$. It is well known that the storage and
generalization properties of the neural network are closely related to this
probability distribution [?, ?, 11].
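The partition described above can be made concrete for a very small Ising perceptron by enumerating all $2^N$ binary coupling vectors. The following is a minimal sketch, not the enumeration code behind the figure; the function name `cell_sizes`, the sizes `N = 11`, `p = 6` and the random seed are assumptions chosen purely for illustration (`N` is odd so that the scalar product never vanishes).

```python
import itertools
import random
from collections import Counter

def cell_sizes(patterns, n):
    """Map each realized output sequence sigma to its cell size P(sigma).

    P(sigma) is the fraction of binary coupling vectors J in {-1,+1}^n
    that produce the outputs sigma on the given input patterns.
    """
    counts = Counter()
    for J in itertools.product((-1, 1), repeat=n):
        # sigma^mu = sign(J . xi^mu); for odd n the sum is never zero
        sigma = tuple(1 if sum(j * x for j, x in zip(J, xi)) > 0 else -1
                      for xi in patterns)
        counts[sigma] += 1
    total = 2 ** n
    return {sigma: c / total for sigma, c in counts.items()}

random.seed(0)                      # illustrative seed (assumption)
N, p = 11, 6                        # illustrative sizes (assumption)
patterns = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
P = cell_sizes(patterns, N)

print(f"{len(P)} non-empty cells out of at most {2 ** p}")
print(f"largest cell: {max(P.values()):.4f}, smallest: {min(P.values()):.4f}")
```

Only non-empty cells appear in the dictionary, so its length is bounded by $2^p$, and every cell size is an integer multiple of $2^{-N}$.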



The natural scale for the cell sizes in the thermodynamic limit $N \to \infty$
is $\epsilon = 2^{-N}$. Due to the random orientation of the hyperplanes,
however, the cells will differ significantly in size from each other. To
describe these fluctuations quantitatively we introduce the crowding index
$\alpha(\sigma)$ by

$$P(\sigma) = \epsilon^{\alpha(\sigma)} \qquad (2)$$

and characterize $P(\sigma)$ by its moments

$$\langle P^q \rangle = \sum_{\sigma} P^q(\sigma) = \epsilon^{\tau(q)} \qquad (3)$$

with the mass exponent $\tau(q)$. As usual [1, ?] the multifractal spectrum
$f(\alpha)$ is given by the Legendre transform of $\tau(q)$:

$$f(\alpha) = \min_q \, [\alpha q - \tau(q)] \qquad (4)$$

and $\mathcal{N}(\alpha) = \epsilon^{-f(\alpha)}$ gives the number of cells of
size $\epsilon^{\alpha}$.
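For a single small sample the definitions of the mass exponent and of the multifractal spectrum can be evaluated directly. The sketch below is only an illustration under stated assumptions: it uses $\epsilon = 2^{-N}$, a single random pattern set rather than an average over inputs, and a finite grid of $q$ values for the Legendre transform; the helper names (`cell_probabilities`, `tau`, `f_spectrum`), the sizes and the seed are hypothetical.

```python
import itertools
import math
import random

def cell_probabilities(patterns, n):
    """Sizes P(sigma) of all non-empty cells, by enumerating J in {-1,+1}^n."""
    counts = {}
    for J in itertools.product((-1, 1), repeat=n):
        # output sequence encoded as booleans: True means sigma^mu = +1
        sigma = tuple(sum(j * x for j, x in zip(J, xi)) > 0 for xi in patterns)
        counts[sigma] = counts.get(sigma, 0) + 1
    return [c / 2 ** n for c in counts.values()]

def tau(probs, q, n):
    """Mass exponent defined by sum_sigma P^q = eps^tau(q), with eps = 2^-n."""
    return -math.log2(sum(P ** q for P in probs)) / n

def f_spectrum(probs, alpha, n, q_grid):
    """Legendre transform f(alpha) = min_q [alpha*q - tau(q)] on a finite grid."""
    return min(alpha * q - tau(probs, q, n) for q in q_grid)

random.seed(1)                                  # illustrative seed (assumption)
N, p = 11, 5                                    # odd N avoids ties J . xi = 0
patterns = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
probs = cell_probabilities(patterns, N)
q_grid = [k / 10 for k in range(-30, 51)]       # q from -3.0 to 5.0

for alpha in (0.2, 0.4, 0.6):
    print(alpha, round(f_spectrum(probs, alpha, N, q_grid), 4))
```

Two checks follow from the definitions alone: normalization of $P(\sigma)$ gives $\tau(1) = 0$, and $-\tau(0)$ equals $\log_2$ of the number of non-empty cells divided by $N$, which cannot exceed $p/N$. At such small $N$ the resulting curve is only a rough finite-size estimate.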

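For large $N$ we expect that $\tau$ and $f$ become self-averaging, i.e. they
will no longer depend on the particular choice of the random inputs
$\boldsymbol{\xi}^\mu$. We can therefore calculate $\tau(q)$ from

$$\tau(q) = -\lim_{N \to \infty} \frac{1}{N \ln 2} \Big\langle\!\Big\langle \ln \sum_{\sigma} P^q(\sigma) \Big\rangle\!\Big\rangle \qquad (5)$$

where $\langle\langle \ldots \rangle\rangle$ denotes the average over the
distribution of the input patterns, which we take as

$$P(\{\xi_i^\mu\}) = \prod_{i,\mu} \frac{1}{2} \big[ \delta(\xi_i^\mu + 1) + \delta(\xi_i^\mu - 1) \big] \qquad (6)$$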

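The calculation of $\tau(q)$ uses a variant of the replica trick introduced
recently by Monasson and O'Kane [12] and will only be sketched. Starting with
the definition of $P(\sigma)$,

$$P(\sigma) = \mathrm{Tr}_{\mathbf{J}} \prod_{\mu} \theta(\sigma^\mu \, \mathbf{J} \cdot \boldsymbol{\xi}^\mu) \qquad (7)$$

where $\mathrm{Tr}_{\mathbf{J}}$ denotes the normalized sum over coupling
vectors and $\theta(x) = 1$ if $x > 0$ and $\theta(x) = 0$ otherwise, we
introduce one replica index $a$ running from 1 to $q$ in order to represent
the $q$-th power of $P$ and another replica index $\alpha$ running from 1 to
$n$ to represent the logarithm in eq. (5) in the usual way [13]. We then find

$$\tau(q) = -\lim_{N \to \infty} \lim_{n \to 0} \frac{1}{n N \ln 2} \Big[ \Big\langle\!\Big\langle \sum_{\{\sigma_\alpha^\mu\}} \mathrm{Tr}_{\{\mathbf{J}^{a\alpha}\}} \prod_{\mu, a, \alpha} \theta(\sigma_\alpha^\mu \, \mathbf{J}^{a\alpha} \cdot \boldsymbol{\xi}^\mu) \Big\rangle\!\Big\rangle - 1 \Big] \qquad (8)$$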



Next the average is performed and the resulting expression is written as a
saddle-point integral over the elements of the overlap matrix

$$Q_{\alpha\beta}^{ab} = \frac{1}{N} \sum_{i} J_i^{a\alpha} J_i^{b\beta} \qquad (9)$$

and its conjugate $\hat{Q}_{\alpha\beta}^{ab}$. To solve the remaining
extremization problem we assume replica symmetry (RS), which in the present
problem is given by the ansatz [12]

$$Q_{\alpha\beta}^{ab} = \begin{cases} Q_1 & \text{if } \alpha = \beta,\ a \neq b \\ Q_0 & \text{if } \alpha \neq \beta \end{cases} \qquad (10)$$

and similarly for $\hat{Q}_{\alpha\beta}^{ab}$. Hence $Q_1$ denotes the typical
overlap between couplings in the same cell (same output vector $\sigma$),
whereas $Q_0$ characterizes the overlap between couplings in different cells.
The reliability of the RS ansatz will be discussed below.
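In this way we get

$$\tau(q) = \frac{1}{\ln 2} \, \mathop{\mathrm{extr}}_{Q_0, \hat{Q}_0, Q_1, \hat{Q}_1} \Big[ \frac{q}{2} \hat{Q}_1 \big(1 - (1 - q) Q_1\big) - \frac{q^2}{2} \hat{Q}_0 Q_0 - \int\! Dz_0 \, \ln \int\! Dz_1 \cosh^q\!\Big( \sqrt{\hat{Q}_1 - \hat{Q}_0} \, z_1 + \sqrt{\hat{Q}_0} \, z_0 \Big) - \gamma \int\! Dt_0 \, \ln 2 \int\! Dt_1 \, H^q\!\Big( \frac{\sqrt{Q_0} \, t_0 + \sqrt{Q_1 - Q_0} \, t_1}{\sqrt{1 - Q_1}} \Big) \Big] \qquad (11)$$

where $Dx = dx \, \exp(-x^2/2)/\sqrt{2\pi}$ and $H(y) = \int_y^\infty Dx$.
Inspection of the saddle-point equations resulting from (11) reveals that
these are always fulfilled for $Q_0 = \hat{Q}_0 = 0$. Physically this is due
to the symmetry $(\mathbf{J}, \sigma) \leftrightarrow (-\mathbf{J}, -\sigma)$
in eq. (1), which ensures that to every cell there is a "mirror" cell of
equal size. As a consequence eq. (11) simplifies to

$$\tau(q) = \frac{1}{\ln 2} \, \mathop{\mathrm{extr}}_{Q_1, \hat{Q}_1} \Big[ \frac{q}{2} \hat{Q}_1 \big(1 - (1 - q) Q_1\big) - \ln \int\! Dz \, \cosh^q\!\big( \sqrt{\hat{Q}_1} \, z \big) - \gamma \ln 2 \int\! Dt \, H^q\!\Big( \sqrt{\frac{Q_1}{1 - Q_1}} \, t \Big) \Big] \qquad (12)$$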



Extremizing this expression with respect to $Q_1$ and $\hat{Q}_1$ numerically
and using eq. (4) we find $f(\alpha)$ as shown in the figure for different
values of the loading parameter $\gamma$. Also shown are results from exact
enumerations up to $N = 30$. The inset gives a finite size analysis
demonstrating very good agreement between the analytical results and the
extrapolation from the numerical data.
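As a hedged illustration of the quantity that is extremized, the sketch below evaluates the replica-symmetric expression with $Q_0 = \hat{Q}_0 = 0$ at fixed order parameters. It assumes the functional has the form $\big[\tfrac{q}{2}\hat{Q}_1(1-(1-q)Q_1) - \ln\int Dz\,\cosh^q(\sqrt{\hat{Q}_1}z) - \gamma \ln 2\int Dt\,H^q(\sqrt{Q_1/(1-Q_1)}\,t)\big]/\ln 2$; the function names, the trapezoidal quadrature and the test values are assumptions for illustration. The extremization itself is deliberately left out.

```python
import math

def gauss_average(func, lim=8.0, steps=4000):
    """Trapezoidal estimate of the Gaussian average int Dz func(z)."""
    h = 2 * lim / steps
    total = 0.0
    for k in range(steps + 1):
        z = -lim + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * func(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2 * math.pi)

def H(y):
    """H(y) = int_y^inf Dx, expressed through the complementary error function."""
    return 0.5 * math.erfc(y / math.sqrt(2))

def rs_functional(q, gamma, Q1, Q1hat):
    """Assumed RS expression for tau(q) at fixed (Q1, Q1hat), not yet extremized."""
    entropic = math.log(
        gauss_average(lambda z: math.cosh(math.sqrt(Q1hat) * z) ** q))
    c = math.sqrt(Q1 / (1.0 - Q1))
    energetic = math.log(2.0 * gauss_average(lambda t: H(c * t) ** q))
    bracket = 0.5 * q * Q1hat * (1.0 - (1.0 - q) * Q1) - entropic - gamma * energetic
    return bracket / math.log(2)

# Illustrative order-parameter values (assumptions, not saddle-point solutions)
print(rs_functional(1.0, 0.4, 0.3, 0.7))   # should vanish up to quadrature error
print(rs_functional(0.0, 0.4, 0.3, 0.7))   # should be close to -gamma
```

Two limits are independent of the order parameters and serve as a consistency check on the assumed form: at $q = 1$ the expression vanishes, in line with the normalization of $P(\sigma)$, and at $q = 0$ it equals $-\gamma$, in line with $f(\alpha_0) = \gamma$.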

For small values of $\gamma$ the $f(\alpha)$-curves have the typical
bell-shaped form. The two zeros $\alpha_{\min}(\gamma)$ and
$\alpha_{\max}(\gamma)$ specify the largest [?] and the smallest cell
occurring with non-zero probability, respectively. Moreover
$\alpha_0(\gamma) = \mathrm{argmax} \, f(\alpha)$ defines the size of the
typical cell, and $f(\alpha_0) = \gamma$ indicates that for small $\gamma$ all
possible cells do indeed occur. Finally, from the normalization of
$P(\sigma)$ we find for all $\gamma$ that $\tau(q = 1) = 0$. This implies that
the $f(\alpha)$-curves are all tangent to the line $f = \alpha$. We denote the
abscissa of the tangential point by $\alpha_1(\gamma)$. Cells of size
$\epsilon^{\alpha_1}$ contribute most to the coupling space.
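The tangency to the line $f = \alpha$ follows from the Legendre structure alone. A short worked sketch, assuming $\tau(q)$ is differentiable and writing $\alpha(q)$ for the value of $\alpha$ at which a given $q$ realizes the minimum:

```latex
\begin{align*}
f(\alpha) &= \min_q \,[\alpha q - \tau(q)]
  \quad\Rightarrow\quad \alpha(q) = \tau'(q), \\
f(\alpha(q)) &= q\,\tau'(q) - \tau(q), \qquad
  \frac{df}{d\alpha} = q, \\
q = 1:\quad f(\alpha_1) &= \alpha_1 - \tau(1) = \alpha_1, \qquad
  \frac{df}{d\alpha}\Big|_{\alpha_1} = 1, \\
q = 0:\quad f(\alpha_0) &= -\tau(0), \qquad
  \frac{df}{d\alpha}\Big|_{\alpha_0} = 0 .
\end{align*}
```

At $q = 1$ the curve therefore touches the line $f = \alpha$ with the same slope, and at $q = 0$ it reaches its maximum, which identifies $\alpha_1 = \tau'(1)$ and $\alpha_0 = \tau'(0)$.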

For larger values of $\gamma$ it becomes important that, due to $J_i = \pm 1$,
the coupling space is discrete and the cell sizes must always be multiples of
$\epsilon$. Values of $\alpha$ larger than 1 thus correspond to empty cells.
For $\alpha_0 = 1$ the typical cell is empty and hence
$\alpha_0(\gamma_c) = 1$ determines the storage capacity $\gamma_c$ [?]. From
the figure we infer $\gamma_c = 0.833$. Similarly, $\alpha_1 = 1$ indicates
that the coupling space is dominated by cells containing a single
$\mathbf{J}$-vector only. Therefore $\alpha_1(\gamma_g) = 1$ defines the
threshold to perfect generalization [11], and the results shown in the figure
yield $\gamma_g = 1.245$.

Both the values for $\gamma_c$ and $\gamma_g$ have been derived previously
[13], and in fact $\alpha(\gamma) = 1$ is similar to the zero-entropy
condition used to derive them. However, it should be emphasized that the
justification of the zero-entropy condition within the traditional Gardner
approach requires one-step replica symmetry breaking (RSB). By contrast, in
the present approach not only does RS already give the correct results but,
anticipating that $Q_0 = \hat{Q}_0 = 0$ for $\alpha_1 < \alpha < \alpha_0$
(see below), we can even dispense with the replica index $\alpha$ and
calculate $\tau(q)$ as an annealed average over the input distribution (6).
This is technically much simpler than a one-step RSB calculation and may open
the way to a more detailed analysis of multilayer networks as well.



We now turn to the reliability of the RS results with $Q_0 = 0$. First of all
we note that from eqs. (2)-(4) $f(\alpha)$ is the entropy of the discrete
spin system $\sigma$ (with Hamiltonian $\alpha(\sigma)$) and must therefore be
non-negative. However, before $f$ becomes negative we find for both positive
and negative $q$ a transition to a saddle point with $Q_0, \hat{Q}_0 > 0$ at
values $q = q_\pm$ given by

$$\sqrt{\gamma} = |q_\pm - 1| \, \hat{Q}_1 (1 - Q_1) \big(1 - (1 - q_\pm) Q_1\big) \qquad (13)$$

Since $Q_0$ gives the typical overlap between $\mathbf{J}$-vectors belonging
to different cells, the analogy with the spin glass problem suggests that
$Q_0 > 0$ signals broken ergodicity. In the present context this means that
moments of $P(\sigma)$ with $q > q_+$ or $q < q_-$ are dominated by cells that
no longer percolate in coupling space. Whereas for $q_+ > q > q_-$ all
dominating cells can be reached from one another without entering cells of a
different size, this is no longer true for the remaining values of $q$. Note
that the transition always occurs outside the interval $(\alpha_1, \alpha_0)$.

We have also investigated the transversal stability of the RS saddle point
using standard techniques [17]. We find that at $q = q_\pm$ the RS saddle
point also becomes unstable with respect to RSB, and this instability is not
removed by $Q_0 > 0$ [18]. On the other hand, we believe that our qualitative
picture of a percolation transition at $q_\pm$ remains valid in an RSB
solution as well.

We have obtained similar results for the spherical perceptron (see also [?]).
For small values of $\gamma$ the $f(\alpha)$-curves are almost identical to
those of the Ising perceptron. Since there is now no smallest possible size
of a cell, the storage capacity $\gamma_c = 2$ [?] has to be determined from
$\alpha_0(\gamma_c) \to \infty$. For $\gamma > \gamma_c$ the spectrum
$f(\alpha)$ is hence monotonically increasing, and the asymptotic value of
$f$ for $\alpha \to \infty$ remains smaller than $\gamma$ since a growing
fraction of classifications cannot be realized. The generalization
properties of the spherical perceptron are characterized by the information
dimension $f(\alpha_1)$ of the multifractal cell structure, which is related
to the volume $V$ of the version space by $V = \epsilon^{f(\alpha_1)}$. The
longitudinal and transversal instability of the RS solution with $Q_0 = 0$
occurs again at the same values $q_\pm$ of $q$, given now by
$\sqrt{\gamma} = |q_\pm - 1| \, Q_1$.

Finally we find from the RS results for both the Ising and the spherical
perceptron an instability $\langle P^q \rangle \to \infty$ of negative
moments if $q < q_c$, where $q_c = 1 - 1/\alpha$ for $\alpha < 1$ and
$q_c = 0$ for $\alpha > 1$. The corresponding endpoint of the
$f(\alpha)$-curve for $\gamma = 0.4$ is marked by a dot in the figure; for
$\gamma = 0.2$ it occurs for $f < 0$. These divergences are due to the
possibility of empty cells with $P(\sigma) = 0$. For small values of $\gamma$
we find $q_c < q_-$ and the instability lies outside the region of validity
of RS. In this case it is therefore necessary to include RSB to elucidate the
nature of this divergence, and we speculate that it is due to the fact that
empty cells can only occur for very rare realizations of the input patterns
[?]. For $\gamma$ larger than a threshold value $\gamma^{VC}$, however, the
instability occurs within the region of validity of RS. Then the probability
of empty cells can no longer be exponentially small in $N$. The instability
of negative moments is hence related to the Vapnik-Chervonenkis dimension of
the neural network; more precisely, $\gamma^{VC}$ determined from
$q_c(\gamma^{VC}) = q_-(\gamma^{VC})$ gives an upper bound on the
VC-dimension $d_{VC}$ [?]. For the spherical perceptron we find in this way
the well known exact result $\gamma^{VC} = 1$. For the Ising perceptron we
get similarly $\gamma^{VC} = 0.557$. The VC-dimension of the Ising perceptron
is not known up to now. One can show that it is at least 0.5 [?] and there is
numerical evidence that it is indeed 0.5 for large $N$ [?]. Our RS upper
bound is somewhat larger, which may indicate that there is a discontinuous
transition to RSB for the Ising perceptron. This question is currently under
study.

In conclusion we have shown that a multifractal analysis of the phase space
of neural networks allows one to refine the standard statistical mechanics
approach to learning and generalization substantially. The instabilities
found in the spectrum $f(\alpha)$ can be related to physical properties of
these systems. We hope that these results may serve as a guide to
understanding similar transitions in other examples of random multifractals.
Moreover, it would be interesting to see whether there are other examples of
percolation transitions in high-dimensional spaces that can be detected from
a bifurcation of an overlap parameter $Q_0$.

Figure caption



Multifractal spectrum $f(\alpha)$ characterizing the cell structure of
the coupling space of an Ising perceptron classifying $\gamma N$ random
input patterns for $\gamma = 0.2$, 0.4, 0.833, and 1.245 (from left to
right). The diamonds mark the region of validity of the replica
symmetric ansatz. The dot denotes the location of the instability
of negative moments for $\gamma = 0.4$ (see text). The histograms are
exact enumeration results for $N = 30$, $p = 6$; $N = 30$, $p = 12$; and
$N = 24$, $p = 20$, respectively. The inset shows a finite size analysis
of $f(\alpha_0)$ for $\gamma = 0.5$. The correct value for $N \to \infty$ is
0.5, the line describes the first correction to the saddle point for
$N < \infty$, and the statistical error of the numerical results is smaller
than the symbol size.